STAT W4701 Exploratory Data Analysis and Visualization
Spring 2016
Project 3
Group 9: The Brockovich-es
Section 1: Motivation
Recent news reports of contaminated drinking water posing severe health hazards to the residents of Flint, MI provided the motivation for our team’s project.
Section 2: Data Sources
We relied on public data sources for the purpose of this project. We list our data sources below and what we used each one for.
U.S. Geological Survey Data
From this source, we got information about which health hazards are caused by which contaminants. This was our data source for the Sankey plot.
(We had to copy the text from this page into a csv file and clean it up before it was ready for plotting. We are including our cleaned file as part of our project so that our plot can be reproduced quickly.)
Data Source
Unregulated Drinking Water Contaminants data
The Environmental Protection Agency monitors the presence of various chemicals in our water systems even if they aren’t regulated yet.
This data allowed us to make the plots in Sections 7 and 8.
Source 1
Source 2
EPA 2010 data on reported health violations from water systems
Section 3: Plot of testing sites by type and drainage area.
In our first plot, we look at all the sites around the country where water quality is being tested. Some of these sites are more important than others because the drainage area of their water source is larger, and hence they serve a larger area. We split the sites into quantiles of drainage area and show sites in higher quantiles with larger icons. In addition, we differentiate the categories of these sites with different colors.
We notice that the green dots in the Mississippi basin show up as belonging to the Inland Rivers category (just as we would expect) and the red dots along the coast are the Coastal Rivers.
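The quantile-based icon sizing described in this section can be sketched as follows. This is a minimal illustration, not the code behind our plot: the site names, drainage areas, the choice of four quantiles (quartiles), and the pixel sizes are all assumptions made for the example.

```python
from bisect import bisect_right
from statistics import quantiles

# Hypothetical drainage areas (arbitrary units) for a handful of test sites.
drainage_areas = {
    "site_a": 12.0,
    "site_b": 150.0,
    "site_c": 980.0,
    "site_d": 4200.0,
    "site_e": 31000.0,
    "site_f": 1150000.0,
}

# Assumed icon radius in pixels for each quartile, smallest to largest.
ICON_SIZES = [4, 7, 10, 14]


def size_by_quartile(areas):
    """Map each site to an icon size based on its drainage-area quartile."""
    # n=4 yields three cut points, which split the sites into four bins.
    cuts = quantiles(list(areas.values()), n=4)
    return {
        site: ICON_SIZES[bisect_right(cuts, area)]
        for site, area in areas.items()
    }


sizes = size_by_quartile(drainage_areas)
for site, size in sizes.items():
    print(site, size)
```

Binning by quantile rather than by raw area keeps a few very large basins from shrinking every other icon to the minimum size.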
Section 4: Plot of testing sites by type and drainage area (with Clustering Options added)
This plot is essentially the same as the previous one, but we enabled Clustering options to see how the test sites were clustered around regions of the country.
Section 5: Trellis plot of concentration of nutrients/contaminants that were found in the testing sites
0-60 mg/L: soft
61-120 mg/L: moderately hard
121-180 mg/L: hard
181+ mg/L: very hard
The hardness of water is reported in grains per gallon or in milligrams per liter (mg/L). The reason for setting an upper limit on hardness is that hard water can cause calcium carbonate scale deposits in automated watering systems, which can lead to drinking valve leaks and other operational problems. According to the Water Quality Association, water is considered “hard” when the measured hardness exceeds 120 mg/L.
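The hardness classes listed above amount to a simple threshold lookup. The sketch below is an illustration only: the function names are hypothetical, and the conversion factor of roughly 17.1 mg/L per grain per gallon is a commonly quoted value that we assume here, not a figure taken from our data sources.

```python
from bisect import bisect_left

# Upper bounds (mg/L) of the first three classes; anything above 180 is "very hard".
HARDNESS_UPPER_BOUNDS = [60, 120, 180]
HARDNESS_LABELS = ["soft", "moderately hard", "hard", "very hard"]

# Assumed conversion factor: 1 grain per gallon is roughly 17.1 mg/L.
MG_PER_L_PER_GRAIN = 17.1


def classify_hardness(mg_per_l):
    """Return the hardness class for a concentration given in mg/L."""
    if mg_per_l < 0:
        raise ValueError("hardness cannot be negative")
    # bisect_left keeps a value equal to a bound (e.g. 120) in the lower class.
    return HARDNESS_LABELS[bisect_left(HARDNESS_UPPER_BOUNDS, mg_per_l)]


def grains_to_mg_per_l(grains_per_gallon):
    """Convert grains per gallon to mg/L using the assumed factor."""
    return grains_per_gallon * MG_PER_L_PER_GRAIN


print(classify_hardness(95))
print(classify_hardness(grains_to_mg_per_l(10)))
```

Fractional readings that fall between two listed ranges (for example 60.5 mg/L) are assigned to the higher class in this sketch.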
0.00-0.025 mg/L: the level in uncontaminated lakes
0.025-0.05 mg/L: level at which plant growth is stimulated
0.05-0.1 mg/L: maximum acceptable to avoid accelerated eutrophication
0.1+ mg/L: accelerated growth and consequent problems
If too much phosphate is present in the water, algae and weeds grow rapidly and may choke the waterway. They also use up large amounts of oxygen, both when photosynthesis is absent and when the algae and plants die and are consumed by aerobic bacteria. The result may be the death of many fish and other aquatic organisms.
The United States EPA, under the authority of the Safe Drinking Water Act (SDWA), has set the Maximum Contaminant Level Goal (MCLG) for nitrate at 10.0 mg/L and for nitrite at 1.0 mg/L (measured as nitrogen, N). This is the health-based goal at which no known or anticipated adverse effects on human health occur and for which an adequate margin of safety exists. Infants below the age of six months who drink water containing nitrate in excess of the MCL could become seriously ill and, if untreated, may die. Symptoms include shortness of breath and blue-baby syndrome.

Total nitrogen refers to the combination of organic and inorganic nitrogen (N). While it can be measured directly in the laboratory, it is also commonly approximated by adding the total Kjeldahl nitrogen (TKN) and nitrite+nitrate-N concentrations. Any level above 10 mg/L is harmful to health.
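The approximation of total nitrogen as the sum of TKN and nitrite+nitrate-N, and the comparison against the 10 mg/L level, can be written out as a short sketch. The function and constant names are hypothetical, and the sample readings are made up for illustration.

```python
# Level (mg/L, as N) above which total nitrogen is treated as harmful in this section.
NITROGEN_LIMIT_MG_PER_L = 10.0


def total_nitrogen(tkn, nitrite_nitrate_n):
    """Approximate total nitrogen (mg/L) as TKN plus nitrite+nitrate-N."""
    return tkn + nitrite_nitrate_n


def exceeds_limit(tkn, nitrite_nitrate_n, limit=NITROGEN_LIMIT_MG_PER_L):
    """Return True when the approximated total nitrogen is above the limit."""
    return total_nitrogen(tkn, nitrite_nitrate_n) > limit


# Made-up readings for two sites: (TKN, nitrite+nitrate-N), both in mg/L.
readings = {"site_a": (1.5, 3.0), "site_b": (4.0, 7.5)}
for site, (tkn, nn) in readings.items():
    print(site, total_nitrogen(tkn, nn), exceeds_limit(tkn, nn))
```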
Section 6: Sankey plot: Contaminant and disease connection
The U.S. Environmental Protection Agency (EPA) provides a list of contaminants whose concentration in drinking water it regulates. These contaminants are separated into six broad categories: Microorganisms (Viruses, Parasites, Bacteria), Disinfectants (Chlorine, Chlorine Dioxide), Disinfection Byproducts (Bromate, Chlorite), Inorganic Chemicals (Arsenic, Cyanide, Lead, Mercury, Nitrite, Nitrate), Organic Chemicals (Benzene, Vinyl Chloride, Acrylamide), and Radionuclides (Alpha particles, Beta particles, Uranium). Each contaminant is given a maximum contaminant level (MCL), which is the largest amount permitted in drinking water. Levels above those thresholds can lead to many health problems and diseases, especially with long-term exposure. For example, lead can cause high blood pressure in adults and developmental issues in children; radioactive particles increase the chances of cancer; and benzene can lead to anemia. The following Sankey diagram maps each type of contaminant to the health disorders that it can potentially cause.
This plot condenses the health effects into a smaller number of categories (e.g. “Anemia” and “Increased Blood Pressure” now fall under “Blood Problems”).
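Condensing health effects into broader categories before drawing the Sankey diagram can be sketched as a mapping followed by a count of links. The rows and the grouping table below are illustrative assumptions, not the contents of our cleaned csv file, and the function name is hypothetical.

```python
from collections import Counter

# Illustrative (contaminant, health effect) rows; not the project's actual data.
links = [
    ("Benzene", "Anemia"),
    ("Benzene", "Decrease in Blood Platelets"),
    ("Lead", "Increased Blood Pressure"),
    ("Uranium", "Increased Risk of Cancer"),
]

# Assumed grouping of specific effects into the broader Sankey categories.
EFFECT_GROUPS = {
    "Anemia": "Blood Problems",
    "Decrease in Blood Platelets": "Blood Problems",
    "Increased Blood Pressure": "Blood Problems",
    "Increased Risk of Cancer": "Cancer",
}


def condense_links(rows, groups):
    """Count links per (contaminant, grouped effect) pair.

    Effects missing from the grouping table keep their original name.
    """
    weights = Counter()
    for contaminant, effect in rows:
        weights[(contaminant, groups.get(effect, effect))] += 1
    return weights


condensed = condense_links(links, EFFECT_GROUPS)
for (source, target), weight in condensed.items():
    print(source, "->", target, weight)
```

Each key of the result is one band in the diagram, and its count can serve as the band width.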
Section 7: Aggregate plot of water quality by state
In this plot, we show the aggregate health violations reported by community water systems in each state and the percentage of the state's population affected by those violations.
We tried to replicate the plot that we saw here.
This map is also interactive. If you move the mouse over a state, you will see the % of the population affected by health violations and the % of water systems in the state reporting those violations.
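The two percentages shown for each state can be derived from per-system records roughly as follows. This is a sketch under stated assumptions: the records are invented, the state labels are placeholders, and the population denominator is taken to be the total population served by the listed systems rather than the full state population.

```python
from collections import defaultdict

# Invented records: (state, population served, reported a health violation).
systems = [
    ("State A", 12000, True),
    ("State A", 8000, False),
    ("State B", 50000, True),
    ("State B", 30000, True),
    ("State B", 20000, False),
]


def summarize_by_state(records):
    """Return the % of systems with violations and % of served population affected."""
    totals = defaultdict(lambda: {"systems": 0, "violating": 0, "pop": 0, "pop_affected": 0})
    for state, population, has_violation in records:
        t = totals[state]
        t["systems"] += 1
        t["pop"] += population
        if has_violation:
            t["violating"] += 1
            t["pop_affected"] += population
    return {
        state: {
            "pct_systems": 100 * t["violating"] / t["systems"],
            "pct_population": 100 * t["pop_affected"] / t["pop"],
        }
        for state, t in totals.items()
    }


summary = summarize_by_state(systems)
for state, stats in summary.items():
    print(state, round(stats["pct_systems"], 1), round(stats["pct_population"], 1))
```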
The darker colors indicate more unclean water. It is interesting to note that South Dakota and Oklahoma have a lot of unclean water. Upon researching possible reasons for this, we found that:
1. There are water test sites in these states in the U.S. Geological Survey data that we used for the first 2 plots in Sections 3 and 4.
2. These states have some of the highest proportions of Native American populations in the country.
This finding suggests a possible association between the proportion of a state's population that is Native American and its water quality, although two states are not enough to establish one. How exactly Native Americans in those states are affected by water quality should be studied in future research.
Section 8: Unregulated Drinking Water Contaminants
In addition to regulating a multitude of known substances, the EPA runs a program under the Safe Drinking Water Act (SDWA) called “Monitoring the Occurrence of Unregulated Drinking Water Contaminants”. Every 5 years, the EPA collects data for 30 new unregulated contaminants, that is, contaminants that do not currently have health-based standards. The purpose of the program is to support the efforts in determining whether to regulate particular contaminants in the future in the interest of protecting public health.
Since 2001, the program has been monitoring large water systems as well as a representative sample of small public water systems serving fewer than 10,000 people. The data is stored in a publicly accessible national database.
Unregulated and dangerous
In the 3rd round (2012-2016) of the EPA program, PFOA, an invisible toxic industrial compound that was used for decades to make Teflon, was detected in 94 public water systems serving 6.5 million Americans in 27 different states.
This interactive plot shows the water systems found to be contaminated by PFOA according to EPA’s testing between 2012 and 2016, as well as the amounts of PFOA found at each location.
This interactive plot shows the water systems found to be contaminated by each of the chemicals known as PFCs, including PFOA, PFOS, PFNA, PFHxS, PFHpA, and PFBS, according to EPA’s testing between 2012 and 2016.
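Counts such as the number of water systems and states with detections of a given chemical can be obtained by filtering the monitoring results and collecting distinct identifiers. The sketch below assumes a simplified record layout with hypothetical field names, placeholder system ids and state codes, and invented values in unspecified units; it does not reproduce the layout of the EPA database.

```python
from typing import NamedTuple, Optional


class Result(NamedTuple):
    system_id: str          # hypothetical water system identifier
    state: str              # placeholder state code
    contaminant: str
    value: Optional[float]  # None means the chemical was not detected


# Invented sample results for illustration only.
results = [
    Result("PWS001", "AA", "PFOA", 0.03),
    Result("PWS001", "AA", "PFOA", 0.05),
    Result("PWS002", "BB", "PFOA", None),
    Result("PWS003", "BB", "PFOS", 0.07),
    Result("PWS004", "CC", "PFOA", 0.02),
]


def summarize_detections(rows, contaminant):
    """Summarize detections of one contaminant across water systems."""
    hits = [r for r in rows if r.contaminant == contaminant and r.value is not None]
    max_by_system = {}
    for r in hits:
        # Keep the highest value seen for each system, e.g. to size a map marker.
        max_by_system[r.system_id] = max(r.value, max_by_system.get(r.system_id, r.value))
    return {
        "systems": len(max_by_system),
        "states": len({r.state for r in hits}),
        "max_by_system": max_by_system,
    }


pfoa = summarize_detections(results, "PFOA")
print(pfoa["systems"], pfoa["states"], pfoa["max_by_system"])
```

Calling the same function once per chemical gives the per-contaminant layers of the second plot.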
Section 9: Conclusion
This project was inspired by reports of high levels of lead in drinking water in Flint, Michigan and Newark, New Jersey. Our report has shown that there are many other sources of contamination in drinking water, regulated and unregulated, and that these pollutants are found in a number of drinking water sources throughout the United States. We hope that our presentation inspires future research. We also hope that it motivates its readers to think critically about where they get their drinking water from and to hold officials accountable for providing clean drinking water, regardless of where in the world our readers live.